
Add Reactome user guide Q&A with intent-based routing. #166

Open
heliamoh wants to merge 4 commits into main from feature/userguide-qa


@heliamoh

  • Add user guide ingestion pipeline and Chroma embeddings via embeddings_manager
  • Add user guide RAG and intent-based routing in React-to-Me
  • Update safety/rephrase for how-to questions; fix citation URLs from context
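
As a rough illustration of the intent-based routing listed above, the sketch below dispatches a query to one of two handlers based on an intent label. The names `route_query`, `keyword_classifier`, and `handlers` are hypothetical, and the keyword matcher is only a stand-in: the PR uses an LLM-based classifier whose prompt and code are not shown on this page.

```python
from typing import Callable, Dict

Intent = str  # expected values: "reactome" or "userguide"


def keyword_classifier(query: str) -> Intent:
    """Hypothetical stand-in for the LLM-based intent classifier."""
    ui_hints = ("how do i", "where can i", "download", "button", "tool", "browser")
    lowered = query.lower()
    return "userguide" if any(hint in lowered for hint in ui_hints) else "reactome"


def route_query(
    query: str,
    handlers: Dict[Intent, Callable[[str], str]],
    classify: Callable[[str], Intent] = keyword_classifier,
    default: Intent = "reactome",
) -> str:
    """Send the query to the handler for its intent; unknown labels use the default."""
    intent = classify(query)
    handler = handlers.get(intent, handlers[default])
    return handler(query)


# Placeholder handlers; in the PR these would be the two RAG chains.
handlers = {
    "reactome": lambda q: f"[reactome RAG] {q}",
    "userguide": lambda q: f"[userguide RAG] {q}",
}
```

Falling back to a default handler when the classifier returns an unexpected label keeps the chatbot answering even if the classifier output drifts; whether the PR does this is not stated here.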

Examples

[Screenshot 2026-06-25 at 6:09:24 PM] [Screenshot 2026-06-25 at 6:09:35 PM]

Copilot AI left a comment


Pull request overview

Adds a new “Reactome user guide” knowledge source to the chatbot, including an ingestion/embedding pipeline and an intent router so UI/how-to questions can be answered from Reactome’s website documentation rather than the biological knowledgebase.

Changes:

  • Introduces user guide HTML fetching + section chunking to generate Chroma embeddings via embeddings_manager.
  • Adds a user guide RAG chain (with citation-oriented prompting) and an intent classifier to route queries between reactome vs userguide.
  • Updates safety + rephrase prompts to treat Reactome UI/how-to questions as in-scope, and updates docs/dependencies accordingly.
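
The section chunking mentioned in the first item can be pictured with the sketch below, which splits a page into one record per heading and keeps the source URL on each record so citations can point back to the page. This is an assumption-laden illustration: `Section`, `split_into_sections`, and the choice of `h1` to `h3` as section boundaries are hypothetical, and it uses only the standard-library `html.parser` rather than whatever HTML parsing dependency the PR adds.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List

HEADING_TAGS = {"h1", "h2", "h3"}  # assumed section boundaries


@dataclass
class Section:
    title: str
    text: str
    source_url: str


class _SectionParser(HTMLParser):
    """Collects [title, body] pairs, starting a new pair at each heading."""

    def __init__(self) -> None:
        super().__init__()
        self.sections: List[List[str]] = []
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._in_heading = True
            self.sections.append(["", ""])

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        if not self.sections:
            return  # text before the first heading is ignored
        index = 0 if self._in_heading else 1
        self.sections[-1][index] += data


def split_into_sections(html: str, source_url: str) -> List[Section]:
    """Return one Section per heading, skipping headings with no body text."""
    parser = _SectionParser()
    parser.feed(html)
    parser.close()
    return [
        Section(" ".join(title.split()), " ".join(body.split()), source_url)
        for title, body in parser.sections
        if body.strip()
    ]
```

Each `Section` would then be wrapped in whatever document type the embedding step expects, with `source_url` carried along as metadata.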

Reviewed changes

Copilot reviewed 15 out of 17 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/retrievers/userguide/retriever.py | Creates a Chroma-backed retriever for user guide section embeddings. |
| src/retrievers/userguide/rag.py | Builds a user guide RAG chain wired to the user guide retriever + prompt. |
| src/retrievers/userguide/prompt.py | Adds a citation-strict prompt for user guide Q&A. |
| src/retrievers/rag_chain.py | Extends the shared RAG chain factory to accept an optional document_prompt. |
| src/data_generation/userguide/urls.py | Defines canonical Reactome user guide URLs to ingest. |
| src/data_generation/userguide/html_loader.py | Loads cached HTML and splits pages into section-level Documents for embedding. |
| src/data_generation/userguide/fetch.py | Downloads user guide HTML pages with a cache and polite request pacing. |
| src/data_generation/userguide/init.py | Implements user guide embedding generation and Chroma persistence. |
| src/agent/tasks/safety_checker.py | Expands “relevance” to include Reactome UI/how-to questions. |
| src/agent/tasks/rephrase.py | Prevents UI/how-to questions from being rewritten into biology/mechanism questions. |
| src/agent/tasks/intent_classifier.py | Adds an LLM-based classifier to route queries to reactome vs userguide. |
| src/agent/profiles/react_to_me.py | Integrates intent routing and dynamically enables user guide RAG when embeddings exist. |
| README.md | Documents date-based versioning for user guide embeddings. |
| pyproject.toml | Adds HTML parsing dependencies for ingestion. |
| poetry.lock | Locks new dependencies and their transitive requirements. |
| docs/embeddings_manager.md | Documents embeddings_manager make .../userguide/<YYYY-MM> usage and --force. |
| bin/embeddings_manager | Adds userguide target and ensures embeddings archive directory exists before pulling. |
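
The behaviour summarised for src/data_generation/userguide/fetch.py (a cache plus polite request pacing) could look roughly like the sketch below. It is not the PR's code: `fetch_all`, the hashed cache file names, and the one-second default delay are all assumptions made for illustration, and the downloader and sleep function are injectable so the logic can be exercised without network access.

```python
import hashlib
import time
import urllib.request
from pathlib import Path
from typing import Callable, Dict, Iterable


def _download(url: str) -> str:
    """Fetch a page over HTTP and decode it as UTF-8."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_all(
    urls: Iterable[str],
    cache_dir: Path,
    delay_seconds: float = 1.0,  # assumed pacing interval
    download: Callable[[str], str] = _download,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Path]:
    """Return a url -> cached file mapping, downloading only pages not yet cached."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    paths: Dict[str, Path] = {}
    fetched_any = False
    for url in urls:
        # Hypothetical naming scheme: a short hash of the URL.
        name = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16] + ".html"
        path = cache_dir / name
        if not path.exists():
            if fetched_any:
                sleep(delay_seconds)  # pause only between real downloads
            path.write_text(download(url), encoding="utf-8")
            fetched_any = True
        paths[url] = path
    return paths
```

Because cache hits skip both the download and the pause, re-running ingestion after a partial failure only touches the pages that are still missing.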
Comments suppressed due to low confidence (1)

src/agent/profiles/react_to_me.py:158

  • generate_answer indexes state["chat_history"], but the initial graph invocation passes only user_input (AgentGraph.ainvoke uses InputState(user_input=...)), so chat_history may be absent on the first turn and this can raise a KeyError. Use state.get("chat_history") with an empty-list/seed-message fallback instead.
                    state["chat_history"]
                    if state["chat_history"]
                    else [HumanMessage(state["user_input"])]
                ),
            },
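
The fallback suggested in this comment can be sketched as follows, assuming the graph state behaves like a plain dict at runtime. The helper name `history_or_seed` and the `make_message` parameter are hypothetical; in the PR's code the message factory would presumably be `HumanMessage`.

```python
from typing import Any, Callable, List, Mapping


def history_or_seed(
    state: Mapping[str, Any],
    make_message: Callable[[str], Any],
) -> List[Any]:
    """Return the existing chat history, or a one-message seed built from user_input."""
    history = state.get("chat_history")  # no KeyError when the key is absent
    if history:
        return list(history)
    return [make_message(state["user_input"])]
```

Using `state.get` covers both a missing key on the first turn and an empty list, which the original indexing only handled in the second case.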


Comment thread pyproject.toml
Comment thread src/agent/tasks/intent_classifier.py Outdated
Comment thread src/data_generation/userguide/html_loader.py
Comment thread src/retrievers/userguide/rag.py Outdated
heliamoh requested review from GFJHogue and removed request for GFJHogue on June 25, 2026, 22:33

2 participants